The Leader Election Problem

2021-01-14

字数统计: 759字 | 阅读时长≈ 4分

The Leader Election Problem is classical and has been useful in many variants of systems, including cloud computing systems.

What is leader ?

leader, in the context of solving distributed system problem, refers to a server who is responsible for receiving all reads and writes. It is useful for coordination among distributed systems.

Leader detection problem

In a group of processes, elect a leader and let everyone know in the group about this leader.

When a leader fails, by using failure detector, some process will detect this. The focus of election algorithm is to ensure:

1. Elect one leader only among the non-failed processes.

2. All non-failed processes in the group agree on the (new) leader.

Who call for an election?

Any process can call (at most one at a time) for a election. Multiple processes are allowed to call simultaneously, and all of them should together yield one leader. And the result of leader does not depend on which process calls for it.

In the end, the non-faulty process with the best election attribute value is elected.

Ring Leader Election

N processes are organized in a logical ring (similar to chord in P2P system).

In the ring, every process has a successor. If a process finds the leader/coordinator fails, it initiates an “Election” message containing its own id (election attribute here) and send to the successor.

When a process receives a “Election” message, if greater than its own attribute then forwards it, otherwise overwrites the message with its own attribute. If one process find its attribute matches the the attribute in message, then it must the best one and become new leader/coordinator.

Then, the new coordinator will send “Elected” message to the all other processes. The processes receiving will be marked as elected (agree on new leader).

Time complexity: O(n), best case: 2n, worst case: 3n - 1

When multiple processes call for election, the result is hold.

But if the would-be leader fails after sending “Elected” message, the liveness is violated.

Some solutions for this condition:

1. the would-be leader will be monitored by its neighbor, and its neighbor should call for a new election when timeout waiting for elected. This solution can fail when the neighbor is also failed (and the neighbor of neighbor also fails).

2. Each process after receiving the “Election” message should detect failure on would-be leader using its local failure detector.

Paxos-like solution in Google Chubby

During each election, each process votes for only once. And any potential should reach the quorum (majority) to get elected. Once a leader is elected, other processes promise to consider it leader for a minimum time, after which, if the leader remains alive, it can renew the master lease and thus remain leader (After election finishes, other servers will not run election again for a “while”) .

Paxos-like solution in Zookeeper

Each server creates a new sequence number for itself. The leader will be the server with highest number. For failure, each server monitors its next higher number process.

If its successor is the leader and it has failed, it become the new leader, else waiting for time out to check again.

It has two phase:

send NEW_LEADER to everyone, and wait for majority of other processes sending ACK.
send COMMIT and all of processes should updates its leader this time.

This strategy may not succeed if:

Multiple sends NEW_LEADER, and no one receives majority of ACK;
Exactly one sends NEW_LEADER，but some ACK get dropped.
Exactly one sends NEW_LEADER，but some of them get dropped.

Bully Algorithm

All processes know other processes’ id.

When leader fails, if one find itself to be new leader(highest id), it sends out “Coordinator” message to others. If it is not, initiates an election by sending “Election” only to those greater than itself. After initiate, if it receives no answer （OK message）within timeout, send “Coordinator” message again, else it wait for “Coordinator” message. It no “Coordinator” received within timeout, start new round.

In this approach, we eventually come up with a leader.

Time Complexity: O(N^2) = N -1 + N - 2+ … +2 + 1

Example:

failure detected by N6